Exploiting the AltiVec Unit for Commercial Applications

Authors

  • Daniel Citron
  • Hiroshi Inoue
  • Takao Moriyama
  • Motohiro Kawahito
  • Hideaki Komatsu
  • Toshio Nakatani
Abstract

The introduction of the PowerPC 970 JS20 blade server opens opportunities for vectorizing commercial applications using the integrated AltiVec unit. We examined the vectorization of applications from diverse fields such as XML parsing, UTF-8 encoding, life sciences, string manipulation, and sorting. We obtained speedups over optimized scalar code of 2-3x for string comparisons, 1.5-5x for XML delimiter lookup, and 2-4x for UTF-8 conversion. The focus of this paper is on the process rather than on the results. Vectorizing commercial applications differs vastly from vectorizing graphics and image processing applications. In addition to the results achieved, we describe the pitfalls encountered, the advantages and disadvantages of the AltiVec unit, and what is missing in its current implementation. Sorting presents an interesting example. Vectorizing the quicksort algorithm was not successful because of low parallelism and misaligned data accesses. Vectorizing the combsort algorithm was very successful, with speedups of 5.0, until the data spilled out of the L2 cache. Combining both approaches, by first partitioning the input using quicksort and then continuing with combsort, yielded speedups of over 2.0. This research led to several patent disclosures, many algorithmic enhancements, and insight into the correct integration of software with the AltiVec unit. The wealth of information collected during this study is being conveyed to the auto-vectorization teams of the relevant compilers.
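The combined sorting approach mentioned in the abstract can be illustrated with a minimal scalar sketch in C. It is not the authors' AltiVec code: the names `comb_sort`, `hybrid_sort`, and the `block_limit` parameter are hypothetical, the Hoare-style partition and the 1.3 shrink factor are conventional choices rather than details from the paper, and `block_limit` only stands in for "small enough to stay in cache". The point it shows is the structure: quicksort-style partitioning splits the input until a block is small, and combsort, whose compare-exchange steps at a fixed gap are independent of each other for neighboring indices, finishes each block.

```c
#include <stddef.h>

/* Exchange two integers. */
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Comb sort: compare-exchange elements that are `gap` apart, shrinking the
 * gap by a factor of about 1.3 per pass until a gap-1 pass makes no swap.
 * While the gap is large, steps at indices i, i+1, i+2, ... touch disjoint
 * pairs, which is the property that makes the loop a SIMD candidate. */
static void comb_sort(int *v, size_t n)
{
    size_t gap = n;
    int swapped = 1;
    while (gap > 1 || swapped) {
        gap = (gap * 10) / 13;
        if (gap < 1) gap = 1;
        swapped = 0;
        for (size_t i = 0; i + gap < n; i++) {
            if (v[i] > v[i + gap]) {
                swap_int(&v[i], &v[i + gap]);
                swapped = 1;
            }
        }
    }
}

/* Hybrid sort: partition quicksort-style until a block holds at most
 * `block_limit` elements (hypothetical threshold), then comb-sort it. */
static void hybrid_sort(int *v, long n, long block_limit)
{
    if (n < 2) return;
    if (n <= block_limit) { comb_sort(v, (size_t)n); return; }

    /* Hoare-style partition around the lower-middle element. */
    int pivot = v[(n - 1) / 2];
    long i = -1, j = n;
    for (;;) {
        do { i++; } while (v[i] < pivot);
        do { j--; } while (v[j] > pivot);
        if (i >= j) break;
        swap_int(&v[i], &v[j]);
    }
    hybrid_sort(v, j + 1, block_limit);                  /* left block  */
    hybrid_sort(v + j + 1, n - j - 1, block_limit);      /* right block */
}
```

A call such as `hybrid_sort(data, count, 4096)` would partition until blocks hold at most 4096 elements; the value 4096 is only a placeholder, and a real threshold would have to be tuned to the cache size of the target machine.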

Similar resources

AltiVec Vectorizes PowerPC: 5/11/98

Better late than never. Apple, IBM, and Motorola have defined a set of multimedia extensions to the basic PowerPC instruction set, making it the last of the six general-purpose architectures to incorporate such a feature. The first processor to use the new AltiVec extensions will be the G4, due to ship in systems in 1H99. If the G4 ships on schedule, it will be more than two years behind Intel’...

Exploiting the Data-level Parallelism in Modern Microprocessors for Neural Network Simulation

Fast SIMD-parallel execution units are available in most modern microprocessors. They provide an internal parallelism degree in the range from 2 to 16 and can accelerate many data-parallel algorithms. In this paper the suitability of five different SIMD units (Intel's MMX and SSE, AMD's 3DNow!, Motorola's AltiVec and Sun's VIS) for the simulation of neural networks is compared. The appropriateness...

G4 Is First PowerPC With AltiVec: 11/16/98

Motorola will extend its PowerPC line with the first G4 processor core, which it unveiled at last month’s Microprocessor Forum. According to project leader Paul Reed, the new core adds a faster floating-point unit to the older G3 core and doubles its cache and system-bus bandwidth. The G4 is also the first chip to incorporate the AltiVec extensions, which greatly increase performance on many ba...

Altivec Vector Unit Customization for Embedded Systems

Vector extensions for general-purpose processors are an efficient feature to address the growing performance demand of multimedia and computer vision applications. Embedded processors are the most widespread architectures for such applications. While providing sufficient computing power for these applications, they must take into account power, area and real-time constraints. In this paper, we ...

On the Efficiency of Reductions in p-SIMD Media Extensions

Many important multimedia applications contain a significant fraction of reduction operations. Although, in general, multimedia applications are characterized by high amounts of data-level parallelism, reductions and accumulations are difficult to parallelize and show a poor tolerance to increases in the latency of the instructions. This is especially significant for p-SIMD extensions such a...

Publication date: 2000